We will continue working with the same subset of data from the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany as in the last exercise. If you have saved it while working on the previous exercise, you should be able to load it with the following command:

corona_survey <- readRDS("../data/gp_corona_subset.rds")

If you have not saved the dataframe, you need to run the full wrangling pipeline from the beginning of the previous exercise again.
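If you are not sure whether the file is still there, a small guard can make the loading step less error-prone. The following is only a sketch: it reuses the path from the command above, and the variable name data_path is introduced here purely for illustration.

```r
# Path taken from the readRDS() call above; adjust it if your project layout differs.
data_path <- "../data/gp_corona_subset.rds"

if (file.exists(data_path)) {
  # The file is available, so load the saved subset
  corona_survey <- readRDS(data_path)
} else {
  # No saved file: the wrangling pipeline from the previous exercise has to be rerun
  message("File not found: ", data_path,
          " - please rerun the wrangling pipeline from the previous exercise.")
}
```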

In this exercise, we will look at more ways to (visually) explore the data.

1

To start with, use ggplot2 to plot a bar chart showing the absolute frequencies (counts) for the age_cat variable, with a different color for each age group.
The geom we need for this is geom_bar. This geom only requires the mapping of an x aes(thetic), but in this case we also want to specify a fill.
library(tidyverse)

corona_survey %>% 
  ggplot(aes(x = age_cat, 
             fill = age_cat)) +
  geom_bar()
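
Because age_cat is mapped to both x and fill, the legend repeats what the x-axis already shows. As an optional variation, the sketch below hides the legend with show.legend = FALSE. The toy_survey data frame and its category labels are made up so that the snippet runs on its own; with the real data you would pipe corona_survey into ggplot() as in the solution.

```r
library(ggplot2)

# Made-up stand-in for corona_survey (illustration only, not real survey data)
toy_survey <- data.frame(
  age_cat = factor(c("Group 1", "Group 2", "Group 2",
                     "Group 3", "Group 3", "Group 3"))
)

p_age <- ggplot(toy_survey, aes(x = age_cat, fill = age_cat)) +
  geom_bar(show.legend = FALSE) +  # the legend would duplicate the x-axis labels
  labs(x = "Age group", y = "Count")

p_age
```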

2

Next, let’s create a plot to show differences in party preferences between men and women in the sample with different colors for the parties.
We, again, want to use a bar chart for this. However, for group comparisons, we need to define the position of the bars to create a grouped bar plot.
corona_survey %>%
  filter(!is.na(choice_of_party)) %>%
  ggplot(aes(x = sex, 
             fill = choice_of_party)) +
  geom_bar(position = "dodge")
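
If the two groups differ in size, raw counts can be hard to compare. One possible alternative, sketched here with made-up data, is position = "fill": it stacks the bars to a common height of 1, so the y-axis shows the share of each party within a group. The party labels and the toy_survey data frame are hypothetical placeholders.

```r
library(ggplot2)

# Made-up stand-in for corona_survey (illustration only, not real survey data)
toy_survey <- data.frame(
  sex = c("Male", "Male", "Male", "Male", "Female", "Female"),
  choice_of_party = c("Party A", "Party A", "Party B", "Party C",
                      "Party A", "Party B")
)

p_party <- ggplot(toy_survey, aes(x = sex, fill = choice_of_party)) +
  geom_bar(position = "fill") +                   # bars are scaled to proportions
  scale_y_continuous(labels = scales::percent) +  # show the y-axis as percentages
  labs(y = "Share within group")

p_party
```

Whether counts ("dodge") or proportions ("fill") are more useful depends on the question; the exercise itself asks for the grouped version.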

3

Now, we want to visually explore how (some of the) summary statistics for the perceived personal risk of becoming infected with the Coronavirus differ between the age groups. In addition, we want to show the (jittered) individual data points.
We can use a boxplot for this. To display the (jittered) individual data points we need geom_jitter() in addition to the geom_boxplot().
corona_survey %>%
  filter(!is.na(risk_self)) %>% 
  ggplot(aes(x = age_cat, 
             y = risk_self)) +
  geom_boxplot() +
  geom_jitter()
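
Note that geom_boxplot() already draws outliers as points, so combining it with geom_jitter() shows those observations twice. A possible refinement, sketched below with simulated data, is to hide the boxplot's own outlier points with outlier.shape = NA and to tune the jitter. The values for width, height, and alpha are arbitrary choices, and the 1 to 7 range of risk_self is an assumption made only for this toy example.

```r
library(ggplot2)

# Simulated stand-in for corona_survey (illustration only, not real survey data)
set.seed(42)
toy_survey <- data.frame(
  age_cat = factor(rep(c("Group 1", "Group 2", "Group 3"), each = 30)),
  risk_self = sample(1:7, size = 90, replace = TRUE)  # assumed rating scale
)

p_risk <- ggplot(toy_survey, aes(x = age_cat, y = risk_self)) +
  geom_boxplot(outlier.shape = NA) +  # outliers are shown by the jitter layer instead
  geom_jitter(width = 0.2,            # horizontal spread within each group
              height = 0.1,           # small vertical spread to reduce overplotting
              alpha = 0.4)            # transparency so dense areas stay readable

p_risk
```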

4

We also want to explore the amount of missing data we have for some of our variables. Using a function from the naniar package, create a plot showing the percentage of missing values for the variables on trust in different people and institutions.
After selecting the variables, we need to use the gg_miss_var() function.
library(naniar)

corona_survey %>% 
  select(starts_with("trust")) %>% 
  gg_miss_var(show_pct = TRUE)
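
To read off the exact numbers behind such a plot, the percentage of missing values per variable can also be computed directly. The following is a minimal sketch using dplyr and a made-up data frame; the column names trust_a and trust_b are hypothetical and only mimic the naming pattern of the trust variables.

```r
library(dplyr)

# Made-up stand-in for corona_survey (illustration only, not real survey data)
toy_survey <- data.frame(
  trust_a = c(1, NA, 3, 4),
  trust_b = c(NA, NA, 2, 5),
  other_var = c(1, 2, 3, 4)
)

pct_missing <- toy_survey %>%
  select(starts_with("trust")) %>%
  # mean(is.na(x)) is the proportion of missing values; times 100 gives percent
  summarise(across(everything(), ~ mean(is.na(.x)) * 100))

pct_missing
```

The naniar package also offers miss_var_summary(), which returns a comparable table with counts and percentages per variable.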

5

Finally, we want to use a function from the GGally package to create a compact visualisation of the distributions and relationships of the following variables: age_cat, education_cat, risk_self, risk_surround, sum_measures, mean_trust.
The function we’re looking for is ggpairs().
library(GGally)

corona_survey %>% 
  select(age_cat,
         education_cat,
         risk_self,
         risk_surround,
         sum_measures,
         mean_trust) %>% 
  ggpairs()
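
As a side note, ggpairs() picks the panel type depending on whether the variables involved are continuous or discrete, and it can select variables itself through its columns argument, which makes the select() step optional. The sketch below illustrates this with simulated data; the toy_survey data frame, its values, and the extra id column are made up for illustration.

```r
library(GGally)

# Simulated stand-in for corona_survey (illustration only, not real survey data)
set.seed(1)
toy_survey <- data.frame(
  id = 1:60,
  age_cat = factor(sample(c("Group 1", "Group 2"), 60, replace = TRUE)),
  risk_self = sample(1:7, 60, replace = TRUE),
  mean_trust = round(runif(60, min = 1, max = 5), 1)
)

p_pairs <- ggpairs(
  toy_survey,
  columns = c("age_cat", "risk_self", "mean_trust"),  # id is left out of the matrix
  progress = FALSE                                    # suppress the progress bar
)

p_pairs
```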